Tradeoffs for nearest neighbors on the sphere

Author

  • Thijs Laarhoven
Abstract

We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the spherical filters recently introduced by [Becker–Ducas–Gama–Laarhoven, SODA’16] to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{\rho_q}$ and update complexity $n^{\rho_u}$ for data sets of size $n$ can be summarized by the following equation in terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$: $c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}$. For small $c = 1 + \varepsilon$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity of approximately $n^{1 - 4\varepsilon^2}$. Balancing the query and update costs leads to optimal complexities of $n^{1/(2c^2 - 1)}$, matching lower bounds from [Andoni–Razenshteyn, 2015] and [Dubiner, IEEE Trans. Inf. Theory 2010] and matching the asymptotic complexities previously obtained by [Andoni–Razenshteyn, STOC’15] and [Andoni–Indyk–Laarhoven–Razenshteyn–Schmidt, NIPS’15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4\varepsilon^2)}$, matching the lower bound $n^{\Omega(1/\varepsilon^2)}$ of [Andoni–Indyk–Pǎtraşcu, FOCS’06] and [Panigrahy–Talwar–Wieder, FOCS’10] and improving upon results of [Indyk–Motwani, STOC’98] and [Kushilevitz–Ostrovsky–Rabani, STOC’98] with a considerably smaller leading constant in the exponent. For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2 + O(1/c^4)}$, improving upon the related asymptotic exponent for large $c$ of [Kapralov, PODS’15] by a factor 2, and matching the lower bound $n^{\Omega(1/c^2)}$ of [Panigrahy–Talwar–Wieder, FOCS’08].
Balancing the costs leads to optimal complexities of the order $n^{1/(2c^2 - 1)}$, while a minimum query time complexity can be achieved with update and space complexities of approximately $n^{2/c^2 + O(1/c^4)}$ and $n^{1 + 2/c^2 + O(1/c^4)}$, also improving upon the previous best exponents of Kapralov by a factor 2 for large $n$ and $c$. For the regime where $n$ is exponential in the dimension, we obtain further improvements compared to results obtained with locality-sensitive hashing. We provide explicit expressions for the query and update complexities in terms of the approximation factor $c$ and the chosen tradeoff, and we derive asymptotic results for the case of the highest possible density for random data sets.
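
As an illustration only, the sparse-regime tradeoff can be explored numerically. The sketch below assumes the tradeoff equation reads $c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}$ and solves it for the query exponent given an update exponent. The function names (`query_exponent`, `balanced_exponent`, `max_update_exponent`) are hypothetical helpers introduced here, not taken from the paper.

```python
import math


def query_exponent(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q.

    Assumes the sparse-regime tradeoff equation as read from the abstract.
    """
    if c <= 1.0:
        raise ValueError("approximation factor c must exceed 1")
    if rho_u < 0.0:
        raise ValueError("update exponent rho_u must be non-negative")
    c2 = c * c
    remainder = math.sqrt(2.0 * c2 - 1.0) - (c2 - 1.0) * math.sqrt(rho_u)
    if remainder < -1e-12:
        raise ValueError("rho_u exceeds the value at which rho_q reaches 0")
    # Clamp tiny negative values caused by floating-point rounding.
    remainder = max(remainder, 0.0)
    return (remainder / c2) ** 2


def balanced_exponent(c: float) -> float:
    """Exponent obtained by setting rho_q = rho_u in the tradeoff equation."""
    return 1.0 / (2.0 * c * c - 1.0)


def max_update_exponent(c: float) -> float:
    """Update exponent at which the query exponent drops to 0."""
    c2 = c * c
    return (2.0 * c2 - 1.0) / (c2 - 1.0) ** 2


if __name__ == "__main__":
    c = 2.0
    print("rho_q with rho_u = 0:   ", query_exponent(c, 0.0))
    print("balanced rho_q = rho_u: ", balanced_exponent(c))
    print("rho_u giving rho_q = 0: ", max_update_exponent(c))
```

Under this reading, setting $\rho_u = 0$ gives $\rho_q = (2c^2 - 1)/c^4$, which is close to $1 - 4\varepsilon^2$ for $c = 1 + \varepsilon$ and equals $2/c^2 - 1/c^4$ for large $c$, in line with the regimes described in the abstract.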

Similar resources

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

Distributional Similarity Models: Clustering vs. Nearest Neighbors

Distributional similarity is a useful notion in estimating the probabilities of rare joint events. It has been employed both to cluster events according to their distributions, and to directly compute averages of estimates for distributional neighbors of a target event. Here, we examine the tradeoffs between model size and prediction accuracy for cluster-based and nearest neighbors distribution...

Query Sphere Indexing for Neighborhood Requests

This is an algorithm for finding neighbors for point objects that can freely move and have no predefined position. The query sphere consists of a center location and a given radius within which nearby objects must be found. Space is discretized in cubic cells. This algorithm introduces an indexing scheme that gives the list of all the cells making up the query sphere, for any radius and any cen...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...

A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater

The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...

The Performance of small samples in quantifying structure central Zagros forests utilizing the indexes based on the nearest neighbors

Today's forest structure issue has converted to one of the main ecological debates in forest science. Determination of forest structure characteristics is necessary to investigate stands changing process, for silviculture interventions and revival operations planning. In order to investigate structure of the part of Ghale-Gol forests in Khorramabad, a set of indices such as Cla...

Journal:
  • CoRR

Volume: abs/1511.07527

Publication date: 2015